Parsing Heterogeneous Corpora with a Rich Dependency Grammar
Abstract
Philologist: I need to parse Old French texts of different types (verse, prose, dialects, etc.). Do I have to train separate parser models?
Computational Linguist: You won’t lose much if you train the parser on all the data you have.
P: I can’t do the training myself. What can I expect from existing parser models?
C: If the training corpus contained 12th-century verse texts, you are best prepared for most flavours of Old French, including prose, except for the very oldest texts.
P: And if I want to parse very old texts?
C: Then the time lapse between your text and the training data should be as small as possible.
P: A golden rule to go home with?
C: Don’t train on prose if you want to parse verse.
P: AOI.